Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques

Authors

  • Paulo Schreiner
  • Aline Villavicencio
  • Leonardo Zilio
  • Helena de Medeiros Caseli
Abstract

Automatic lexical alignment is a vital step for empirical machine translation, and although good results can be obtained with existing models (e.g., Giza++), more precise alignment is still needed to handle complex constructions such as multiword expressions successfully. In this paper, we propose an approach to lexical alignment that combines statistical and linguistic information. We describe the development of a baseline discriminative aligner and a set of language-dependent post-processing functions that allow the inclusion of shallow linguistic knowledge. The post-processing functions were designed to significantly improve word alignment, mainly on verb-particle constructs, both over our baseline and over Giza++.

Similar resources

Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice

Morphological complexity
  • Data sparsity due to uncovered inflected forms
  • Difficulty to produce the correct target-side inflection based on available information
COMBINING APPROACHES
  • Pre-processing – syntactic level: Source-side reordering (Gojun and Fraser, 2012)
  • At decoding time – lexical level: Discriminative classifier to score translation rules using source-side context (Tamchyna et al...

JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence

The complex word identification task refers to the process of identifying difficult words in a sentence from the perspective of readers belonging to a specific target audience. This task has immense importance in the field of lexical simplification. Lexical simplification helps in improving the readability of texts consisting of challenging words. As a participant of the SemEval-2016: Task 11 s...

Towards Accurate and Efficient Chinese Part-of-Speech Tagging

From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical rela...

Using Cognates in a French-Romanian Lexical Alignment System: A Comparative Study

This paper describes a hybrid French-Romanian cognate identification module. This module is used by a lexical alignment system. Our cognate identification method uses lemmatized, tagged and sentence-aligned parallel corpora. This method combines statistical techniques, linguistic information (lemmas, POS tags) and orthographic adjustments. We evaluate our cognate identification module and we co...

Image Segmentation using Improved Imperialist Competitive Algorithm and a Simple Post-processing

Image segmentation is a fundamental step in many image processing applications. In most cases, the image's pixels are clustered based only on the pixels' intensity or color information, and neither spatial nor neighborhood information of pixels is used in the clustering process. Considering the importance of including spatial information of pixels which improves the quality of image segmentati...

Publication date: 2011